---
title: Exploring Object Detection using Icevision w/ FastAI
keywords: fastai
sidebar: home_sidebar
nb_path: "20_subcoco_ivf.ipynb"
---

Download a Sample of COCO Data

The full COCO dataset is huge (roughly 50GB). For self-education in object detection, with the intention of using a pretrained model for transfer learning, it is not practical to deal with a dataset this big as a first project. Luckily, the kind folks at FastAI have prepared some convenient subsets: the medium-size 3GB https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz seems like a good candidate. The 800KB http://files.fast.ai/data/examples/coco_tiny.tgz, on the other hand, seems way too small and may not have enough data for adequate training.

{% raw %}
train_json = fetch_subcoco(datadir='/tmp', url='http://files.fast.ai/data/examples/coco_tiny.tgz', img_subdir='train')
img_dir='/tmp/coco_tiny/train'

#train_json = fetch_subcoco(datadir='workspace', url='https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz', img_subdir='train_sample')
#img_dir='workspace/coco_sample/train_sample'
{% endraw %}
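fetch_subcoco is a helper defined in this project's library and its implementation is not shown here. A minimal sketch of what such a helper plausibly does — download the archive once, extract it, and parse the annotation JSON — could look like the following; fetch_and_load_json and its json_name parameter are hypothetical names for illustration, not the actual API:

```python
import json
import tarfile
import urllib.request
from pathlib import Path

def fetch_and_load_json(datadir, url, json_name="train.json"):
    """Download the .tgz once, extract it, and return the parsed annotation JSON."""
    datadir = Path(datadir)
    datadir.mkdir(parents=True, exist_ok=True)
    tgz_path = datadir / Path(url).name
    if not tgz_path.exists():
        urllib.request.urlretrieve(url, tgz_path)  # skip the download on re-runs
    with tarfile.open(tgz_path) as tf:
        tf.extractall(datadir)
    # the annotation file sits somewhere inside the extracted tree
    json_path = next(datadir.rglob(json_name))
    return json.loads(json_path.read_text())
```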

If using the tiny COCO subset, use these values:

train_json = fetch_subcoco(url='http://files.fast.ai/data/examples/coco_tiny.tgz', img_subdir='train')

If using the sample COCO subset, use these values:

train_json = fetch_subcoco(url='https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz', img_subdir='train_sample')

Check Annotations

Let's load and inspect the annotation file that comes with the COCO tiny dataset...

{% raw %}
train_json['categories'], train_json['images'][0], [a for a in train_json['annotations'] if a['image_id']==train_json['images'][0]['id'] ]
([{'id': 62, 'name': 'chair'},
  {'id': 63, 'name': 'couch'},
  {'id': 72, 'name': 'tv'},
  {'id': 75, 'name': 'remote'},
  {'id': 84, 'name': 'book'},
  {'id': 86, 'name': 'vase'}],
 {'id': 318219, 'file_name': '000000318219.jpg'},
 [{'image_id': 318219,
   'bbox': [505.24, 0.0, 47.86, 309.25],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [470.68, 0.0, 45.93, 191.86],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [442.51, 0.0, 43.39, 119.87],
   'category_id': 72}])
{% endraw %}
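Note the bbox format above: COCO stores boxes as [x_min, y_min, width, height], while many plotting and model APIs expect corner coordinates instead. A tiny conversion helper (hypothetical, for illustration only):

```python
def coco_to_corners(bbox):
    """Convert a COCO [x_min, y_min, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# e.g. the first 'tv' box above: x_max = 505.24 + 47.86 ≈ 553.1
coco_to_corners([505.24, 0.0, 47.86, 309.25])
```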

Digest the Dataset for Useful Stats

Do some basic analysis of the data to get numbers like total images, boxes, and average box count per image...

{% raw %}
stats = load_stats(train_json, img_dir=img_dir, force_reload=False)
print(
    f"Categories {stats.num_cats}, Images {stats.num_imgs}, Boxes {stats.num_bboxs}, avg (w,h) {(stats.avg_width, stats.avg_height)}, "
    f"avg cats/img {stats.avg_ncats_per_img:.1f}, avg boxs/img {stats.avg_nboxs_per_img:.1f}, avg boxs/cat {stats.avg_nboxs_per_cat:.1f}.")

print(f"Image means by channel {stats.chn_means}, std.dev by channel {stats.chn_stds}")
stats.num_imgs, stats.lbl2name, stats.lbl2cat, stats.cat2lbl, stats.lbl2name
Categories 6, Images 21837, Boxes 87106, avg (w,h) (575.6857626963424, 481.71420066859), avg cats/img 7.0, avg boxs/img 4.0, avg boxs/cat 14517.7.
Image means by channel [115.64436835 103.2992867   91.73613059], std.dev by channel [64.16724017 62.63021182 61.92975836]
(21837,
 {1: 'chair', 2: 'couch', 3: 'tv', 4: 'remote', 5: 'book', 6: 'vase'},
 {1: 62, 2: 63, 3: 72, 4: 75, 5: 84, 6: 86, 0: 0},
 {62: 1, 63: 2, 72: 3, 75: 4, 84: 5, 86: 6, 0: 0},
 {1: 'chair', 2: 'couch', 3: 'tv', 4: 'remote', 5: 'book', 6: 'vase'})
{% endraw %}
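The lbl2cat and cat2lbl maps above remap the sparse COCO category ids (62, 63, 72, ...) onto contiguous labels starting at 1, with 0 reserved for background. A minimal sketch of how such maps could be built from the categories list (build_label_maps is a hypothetical name, not part of the library):

```python
def build_label_maps(categories):
    """Remap sparse COCO category ids to contiguous labels, reserving 0 for background."""
    lbl2cat = {lbl: cat["id"] for lbl, cat in enumerate(categories, start=1)}
    lbl2cat[0] = 0  # background maps to itself
    cat2lbl = {cat_id: lbl for lbl, cat_id in lbl2cat.items()}
    lbl2name = {lbl: cat["name"] for lbl, cat in enumerate(categories, start=1)}
    return lbl2cat, cat2lbl, lbl2name

cats = [{'id': 62, 'name': 'chair'}, {'id': 63, 'name': 'couch'}, {'id': 72, 'name': 'tv'},
        {'id': 75, 'name': 'remote'}, {'id': 84, 'name': 'book'}, {'id': 86, 'name': 'vase'}]
lbl2cat, cat2lbl, lbl2name = build_label_maps(cats)
```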

Custom Parser for Icevision

{% raw %}

box_within_bounds[source]

box_within_bounds(x, y, w, h, width, height, min_margin_ratio, min_width_height_ratio)

{% endraw %} {% raw %}

class SubCocoParser[source]

SubCocoParser(stats:CocoDatasetStats, min_margin_ratio=0, min_width_height_ratio=0, quiet=True) :: Parser

Base class for all parsers, implements the main parsing logic.

The actual fields to be parsed are defined by the mixins used when defining a custom parser. The only required field for all parsers is the image_id.

Examples

Create a parser for image filepaths.

class FilepathParser(Parser, FilepathParserMixin):
    # implement required abstract methods
{% endraw %} {% raw %}
{% endraw %}
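The implementation of box_within_bounds is not shown. A hypothetical sketch of the kind of check its signature suggests — reject boxes that hug the image margin or are too small relative to the image — might look like this (the exact semantics and thresholds are assumptions, not the actual code):

```python
def box_within_bounds(x, y, w, h, width, height,
                      min_margin_ratio=0.0, min_width_height_ratio=0.0):
    """Return True if the box keeps a minimum margin from the image border
    and is not too small relative to the image (a guessed interpretation)."""
    mx, my = width * min_margin_ratio, height * min_margin_ratio
    inside = x >= mx and y >= my and x + w <= width - mx and y + h <= height - my
    big_enough = (w >= width * min_width_height_ratio
                  and h >= height * min_width_height_ratio)
    return inside and big_enough
```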

Load Data Using Custom Parser

To prevent bounding boxes from ending up too close to the margin or too small, especially after augmentation transforms, I would set min_margin_ratio = 0.05 and min_width_height_ratio = 0.05.

However, IceVision 2.0 now has autofix, which should address these issues; it does take a long time to run, though...

{% raw %}

parse_subcoco[source]

parse_subcoco(stats:CocoDatasetStats)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
train_records, valid_records = parse_subcoco(stats)
Skipped 5362 out of 21837 images

{% endraw %}

Show images with their corresponding labels and boxes:

{% raw %}
class_map = ClassMap(list(stats.lbl2name.values()))
show_records(train_records[:4], ncols=2, class_map=class_map, show=True)
{% endraw %}

Custom FastAI Callback to Include Metric in Save Model Filename

{% raw %}

class SaveModelDupBestCallback[source]

SaveModelDupBestCallback(monitor='valid_loss', comp=None, min_delta=0.0, fname='model', every_epoch=False, with_opt=False, reset_on_fit=True) :: SaveModelCallback

Extend SaveModelCallback to save a duplicate with metric added to end of filename

{% endraw %} {% raw %}
{% endraw %}
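The backup filenames in the training log below follow a {fname}@{epoch}_{monitor}={value:.3f}.pth pattern. A sketch of the duplication step such a callback might perform, assuming the best model has just been saved by the parent SaveModelCallback (backup_best is a hypothetical helper, not the actual implementation):

```python
import shutil
from pathlib import Path

def backup_best(model_fpath, epoch, monitor, value):
    """Copy the just-saved best model to a name that records the epoch and metric."""
    src = Path(model_fpath)
    dst = src.with_name(f"{src.stem}@{epoch}_{monitor}={value:.3f}{src.suffix}")
    shutil.copyfile(src, dst)
    return dst

# backup_best("models/tf_efficientdet_lite0-512.pth", 19, "COCOMetric", 0.36023581)
# would produce models/tf_efficientdet_lite0-512@19_COCOMetric=0.360.pth
```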

How can I unit test this? Perhaps by copying the testing approach from FastAI's own callback tests.

Create Transforms, Model, Training and Validation Dataloaders, Learners

  • Define transforms, using Albumentations transforms out of the box.
  • Use them to construct Datasets and Dataloaders.
  • Make a Learner.
{% raw %}

gen_transforms_and_learner[source]

gen_transforms_and_learner(stats:CocoDatasetStats, train_records:List[BaseRecord], valid_records:List[BaseRecord], img_sz=128, bs=4, acc_cycs=8, num_workers=2)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
img_sz, bs, acc, workers, head_runs, full_runs = 128, 4, 8, 1, 1, 1
#img_sz, bs, acc, workers, head_runs, full_runs = 512, 4, 8, 4, 20, 200
inf_tfms, learn, backbone_name = gen_transforms_and_learner(stats, train_records, valid_records, 
                                                            img_sz=img_sz, bs=bs, acc_cycs=acc, num_workers=workers)
{% endraw %}

I have experimented with other models available out of the box in IceVision, but efficientdet works best. To test alternatives, replace backbone_name, backbone, and model with the following values.

backbone_name

  • "resnet_fpn.resnet18"

backbone

  • backbones.resnet_fpn.resnet18(pretrained=True)
  • backbones.resnet_fpn.resnet34(pretrained=True)
  • backbones.resnet_fpn.resnet50(pretrained=True) # Default
  • backbones.resnet_fpn.resnet101(pretrained=True)
  • backbones.resnet_fpn.resnet152(pretrained=True)
  • backbones.resnet_fpn.resnext50_32x4d(pretrained=True)
  • backbones.resnet_fpn.resnext101_32x8d(pretrained=True)
  • backbones.resnet_fpn.wide_resnet50_2(pretrained=True)
  • backbones.resnet_fpn.wide_resnet101_2(pretrained=True)

model

  • faster_rcnn.model(backbone=backbone, num_classes=len(stats.lbl2name))

Train using FastAI

{% raw %}
# if torch.cuda.is_available():
#     learn.lr_find()
{% endraw %} {% raw %}

run_training[source]

run_training(learn:Learner, min_lr=0.05, head_runs=1, full_runs=1)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
if torch.cuda.is_available():
    run_training(learn, min_lr=0.01, head_runs=head_runs, full_runs=full_runs)
Training for 20+200 epochs at min LR 0.01
epoch train_loss valid_loss COCOMetric time
0 38.005032 33.342953 0.000074 31:38
1 2.939105 2.584084 0.002158 31:53
2 1.297279 1.191830 0.030757 30:50
3 1.032163 0.913825 0.104741 31:05
4 0.898730 0.783614 0.221129 31:16
5 0.839787 0.717855 0.282516 31:33
6 0.808806 0.727686 0.284744 31:23
7 0.779200 0.692641 0.303767 31:31
8 0.809072 0.691453 0.306420 31:24
9 0.805325 0.670977 0.320615 31:37
10 0.764411 0.677121 0.307504 31:24
11 0.743948 0.645292 0.339091 31:37
12 0.744827 0.654019 0.325625 32:18
13 0.727114 0.637847 0.343310 31:55
14 0.730767 0.632273 0.358012 32:34
15 0.752622 0.641926 0.334332 31:59
16 0.720990 0.628331 0.340647 32:05
17 0.704894 0.621887 0.345080 31:55
18 0.693803 0.628212 0.328261 32:04
19 0.691504 0.609884 0.360236 32:20
Better model found at epoch 0 with COCOMetric value: 7.351462309852022e-05.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@0_COCOMetric=0.000.pth
Better model found at epoch 1 with COCOMetric value: 0.0021581666552511966.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@1_COCOMetric=0.002.pth
Better model found at epoch 2 with COCOMetric value: 0.030757115219257607.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@2_COCOMetric=0.031.pth
Better model found at epoch 3 with COCOMetric value: 0.10474126128636625.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@3_COCOMetric=0.105.pth
Better model found at epoch 4 with COCOMetric value: 0.22112907749246885.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@4_COCOMetric=0.221.pth
Better model found at epoch 5 with COCOMetric value: 0.2825155602494337.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@5_COCOMetric=0.283.pth
Better model found at epoch 6 with COCOMetric value: 0.2847437254864597.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@6_COCOMetric=0.285.pth
Better model found at epoch 7 with COCOMetric value: 0.30376697478248543.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@7_COCOMetric=0.304.pth
Better model found at epoch 8 with COCOMetric value: 0.30641991991900114.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@8_COCOMetric=0.306.pth
Better model found at epoch 9 with COCOMetric value: 0.32061489251664255.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@9_COCOMetric=0.321.pth
Better model found at epoch 11 with COCOMetric value: 0.33909078597602477.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@11_COCOMetric=0.339.pth
Better model found at epoch 13 with COCOMetric value: 0.34331049983411116.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@13_COCOMetric=0.343.pth
Better model found at epoch 14 with COCOMetric value: 0.3580120943675082.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@14_COCOMetric=0.358.pth
Better model found at epoch 19 with COCOMetric value: 0.36023581037600066.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@19_COCOMetric=0.360.pth
epoch train_loss valid_loss COCOMetric time
0 0.705941 0.592051 0.373433 38:08
1 0.695365 0.585187 0.383405 38:18
2 0.643945 0.581034 0.384970 38:02
3 0.658664 0.578871 0.387123 38:07
4 0.680404 0.577860 0.386550 38:06
5 0.660106 0.581594 0.386339 38:13
6 0.650884 0.577494 0.394019 38:07
7 0.645906 0.581913 0.384013 38:14
8 0.609002 0.578590 0.388758 38:35
9 0.649965 0.573236 0.398423 38:23
10 0.631171 0.578832 0.386258 39:21
11 0.609935 0.574516 0.398511 38:17
12 0.636813 0.577509 0.389471 38:27
13 0.634071 0.577487 0.390812 38:19
14 0.620185 0.576400 0.393455 38:11
15 0.609771 0.572370 0.392193 38:10
16 0.637317 0.576125 0.395918 38:08
17 0.595624 0.575718 0.400241 38:14
18 0.610124 0.579062 0.392582 38:12
19 0.649167 0.578284 0.388842 38:30
20 0.597645 0.581600 0.387782 38:39
21 0.618592 0.588434 0.379126 38:20
22 0.601891 0.578218 0.391881 38:17
23 0.582451 0.578535 0.394213 38:20
24 0.581206 0.581188 0.389038 38:59
25 0.584891 0.586259 0.379954 38:40
26 0.599408 0.584926 0.383964 38:58
27 0.567105 0.584930 0.385572 38:25
Better model found at epoch 0 with COCOMetric value: 0.37343308376504636.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@0_COCOMetric=0.373.pth
Better model found at epoch 1 with COCOMetric value: 0.3834046237982395.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@1_COCOMetric=0.383.pth
Better model found at epoch 2 with COCOMetric value: 0.38497033866151575.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@2_COCOMetric=0.385.pth
Better model found at epoch 3 with COCOMetric value: 0.38712345369355794.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@3_COCOMetric=0.387.pth
Better model found at epoch 6 with COCOMetric value: 0.3940191636756219.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@6_COCOMetric=0.394.pth
Better model found at epoch 9 with COCOMetric value: 0.3984231287870522.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@9_COCOMetric=0.398.pth
Better model found at epoch 11 with COCOMetric value: 0.3985110730112441.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@11_COCOMetric=0.399.pth
Better model found at epoch 17 with COCOMetric value: 0.4002408533482014.
Backup models/tf_efficientdet_lite0-512.pth as models/tf_efficientdet_lite0-512@17_COCOMetric=0.400.pth
No improvement since epoch 17: early stopping
{% endraw %}

Inference

{% raw %}
if torch.cuda.is_available():
    infer_ds = Dataset(valid_records[:4], inf_tfms)
    infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=True)
    samples, preds = efficientdet.predict_dl(learn.model, infer_dl)
    imgs = [sample["img"] for sample in samples]
    show_preds(
        imgs=imgs[:4],
        preds=preds[:4],
        class_map=class_map,
        denormalize_fn=denormalize_imagenet,
        ncols=1,
        figsize=(36,27)
    )

{% endraw %}

As you can see, after training for only 2 epochs the model is not yet usable.

Saving Final Model Explicitly

Save the model explicitly after all the epochs complete.

{% raw %}

save_final[source]

save_final(learn:Learner, save_model_fpath:str)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
final_saved_model_fpath = f"models/{backbone_name}-subcoco-{img_sz}-runs-{head_runs}+{full_runs}-final.pth"
save_final(learn, final_saved_model_fpath)
{% endraw %}

Inference w/ Pretrained Model

Construct a fresh model and load the saved weights into it.

{% raw %}
pretrained_model = efficientdet.model(model_name=backbone_name, num_classes=len(stats.lbl2name), img_size=img_sz)
pretrained_model.load_state_dict(torch.load(final_saved_model_fpath))
<All keys matched successfully>
{% endraw %}

Run inference on the first 4 validation images...

{% raw %}
if torch.cuda.is_available():
    infer_ds = Dataset(valid_records[:4], inf_tfms)
    infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=False)
    samples, preds = efficientdet.predict_dl(pretrained_model.cuda(), infer_dl)
    imgs = [sample["img"] for sample in samples]
    show_preds(
        imgs=imgs[:4],
        preds=preds[:4],
        class_map=class_map,
        denormalize_fn=denormalize_imagenet,
        ncols=1,
        figsize=(36,27)
    )

{% endraw %}